Presentation in Free-Form Space: Managing Ambiguity with Hypermedia Pathways While Supporting Ideation
Traditional slideware presentation tools (e.g., PowerPoint) suffer from the problem of premature formalism, which interferes with how authors develop new knowledge. Free-form spatial content organization can overcome this problem by allowing users to express multiple, emerging relationships among content elements. Although its ambiguity fosters interpretation of relationships for both authors and audiences, the same ambiguity makes presentations more challenging to perform. Therefore, we integrated hypermedia pathways with a free-form space to support presentations. We conducted a field study involving 158 users to understand authors’ experiences of creating content in free-form space, integrated with hypermedia pathways for presentation. Our findings show that this integration supports users not only in developing new ideas, but also in performing the presentations.
An Information-Theoretic Framework for Evaluating Edge Bundling Visualization
Edge bundling is a promising graph visualization approach for simplifying the visual result of a graph drawing. Many edge bundling methods have been developed to generate diverse graph layouts. However, it is difficult to defend an edge bundling method and its resulting layout against other edge bundling methods, because a clear theoretical evaluation framework is absent from the literature. In this paper, we propose an information-theoretic framework to evaluate the visual results of edge bundling techniques. We first illustrate the advantage of edge bundling visualizations for large graphs, and pinpoint the ambiguity in the resulting drawings. Second, we define and quantify the amount of information delivered by edge bundling visualization from the underlying network using information theory. Third, we propose a new algorithm to evaluate the resulting layouts of edge bundling using the amount of mutual information between a raw network dataset and its edge bundling visualization. Comparison examples based on the proposed framework between different edge bundling techniques are presented.
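The abstract above evaluates layouts through the mutual information between the raw network and its bundled drawing. As a minimal sketch of that quantity only, the following computes the standard discrete mutual information I(X;Y) from paired observations. It is not the paper's algorithm: the function name `mutual_information` and the toy mapping of edges to bundles are assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Return I(X;Y) in bits, estimated from a list of (x, y) observations."""
    n = len(pairs)
    joint = Counter(pairs)                 # counts of each (x, y) pair
    px = Counter(x for x, _ in pairs)      # marginal counts of x
    py = Counter(y for _, y in pairs)      # marginal counts of y
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p(x,y) / (p(x) p(y)) simplifies to count * n / (count_x * count_y)
        mi += p_xy * math.log2(count * n / (px[x] * py[y]))
    return mi


# Hypothetical toy data: four edges drawn as two bundles of two edges each.
edge_to_bundle = [("e1", "b1"), ("e2", "b1"), ("e3", "b2"), ("e4", "b2")]
bundled_bits = mutual_information(edge_to_bundle)

# Statistically independent variables share no information.
independent_bits = mutual_information([(0, 0), (0, 1), (1, 0), (1, 1)])
```

In the toy case, knowing the bundle narrows four equally likely edges down to two, so the drawing carries one of the two bits needed to identify an edge.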
Multi-Speaker Multi-Lingual VQTTS System for LIMMITS 2023 Challenge
In this paper, we describe the systems developed by the SJTU X-LANCE team for
LIMMITS 2023 Challenge, and we mainly focus on the winning system on
naturalness for track 1. The aim of this challenge is to build a multi-speaker
multi-lingual text-to-speech (TTS) system for Marathi, Hindi and Telugu. Each
of the languages has a male and a female speaker in the given dataset. In track
1, only 5 hours of data from each speaker can be selected to train the TTS model.
Our system is based on the recently proposed VQTTS that utilizes VQ acoustic
features rather than mel-spectrograms. We introduce additional speaker embeddings
and language embeddings to VQTTS for controlling the speaker and language
information. In the cross-lingual evaluations where we need to synthesize
speech in a cross-lingual speaker's voice, we provide a native speaker's
embedding to the acoustic model and the target speaker's embedding to the
vocoder. In the subjective MOS listening test on naturalness, our system
achieves 4.77, which ranks first.
Comment: Accepted by ICASSP 2023 Special Session for Grand Challenge
Matrix GARCH Model: Inference and Application
Matrix-variate time series data are widely available in applications.
However, no attempt has been made to study their conditional heteroskedasticity
that is often observed in economic and financial data. To address this gap, we
propose a novel matrix generalized autoregressive conditional
heteroskedasticity (GARCH) model to capture the dynamics of conditional row and
column covariance matrices of matrix time series. The key innovation of the
matrix GARCH model is the use of a univariate GARCH specification for the trace
of conditional row or column covariance matrix, which allows for the
identification of conditional row and column covariance matrices. Moreover, we
introduce a quasi maximum likelihood estimator (QMLE) for model estimation and
develop a portmanteau test for model diagnostic checking. Simulation studies
are conducted to assess the finite-sample performance of the QMLE and
portmanteau test. To handle large dimensional matrix time series, we also
propose a matrix factor GARCH model. Finally, we demonstrate the superiority of
the matrix GARCH and matrix factor GARCH models over existing multivariate
GARCH-type models in volatility forecasting and portfolio allocations using
three applications on credit default swap prices, global stock sector indices,
and future prices.
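The abstract above builds on a univariate GARCH specification for the trace of a conditional covariance matrix. As a rough illustration of what a univariate GARCH recursion looks like, here is a sketch of the textbook GARCH(1,1) conditional variance update for a scalar series. The order (1,1), the function name `garch11_variance`, and all parameter values are assumptions; the matrix model and its estimator are not reproduced here.

```python
def garch11_variance(residuals, omega, alpha, beta, initial_variance):
    """Conditional variances under GARCH(1,1):

    sigma2[t] = omega + alpha * residuals[t-1]**2 + beta * sigma2[t-1]

    Returns one variance per residual, starting from initial_variance.
    """
    variances = [initial_variance]
    for eps in residuals[:-1]:
        variances.append(omega + alpha * eps ** 2 + beta * variances[-1])
    return variances


# Hypothetical residuals and parameters, chosen only to show the recursion.
sigma2 = garch11_variance([1.0, 2.0, 0.0], omega=0.1, alpha=0.2, beta=0.7,
                          initial_variance=1.0)
```

The large second residual raises the third variance, which is the volatility clustering that GARCH-type models are designed to capture.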
A Novel LiDAR-Based Instrument for High-Throughput, 3D Measurement of Morphological Traits in Maize and Sorghum
Recently, image-based approaches have developed rapidly for high-throughput plant phenotyping (HTPP). Imaging reduces a 3D plant into 2D images, which makes the retrieval of plant morphological traits challenging. We developed a novel LiDAR-based phenotyping instrument to generate 3D point clouds of single plants. The instrument combined a LiDAR scanner with a precision rotation stage on which an individual plant was placed. A LabVIEW program was developed to control the scanning and rotation motion, synchronize the measurements from both devices, and capture a 360° view point cloud. A data processing pipeline was developed for noise removal, voxelization, triangulation, and plant leaf surface reconstruction. Once the leaf digital surfaces were reconstructed, plant morphological traits, including individual and total leaf area, leaf inclination angle, and leaf angular distribution, were derived. The system was tested with maize and sorghum plants. The results showed that leaf area measurements by the instrument were highly correlated with the reference methods (R² > 0.91 for individual leaf area; R² > 0.95 for total leaf area of each plant). Leaf angular distributions of the two species were also derived. This instrument could fill a critical technological gap for indoor HTPP of plant morphological traits in 3D.
Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS
Self-supervised learning (SSL) proficiency in speech-related tasks has driven
research into utilizing discrete tokens for speech tasks like recognition and
translation, which offer lower storage requirements and great potential to
employ natural language processing techniques. However, these studies, mainly
single-task focused, faced challenges like overfitting and performance
degradation in speech recognition tasks, often at the cost of sacrificing
performance in multi-task scenarios. This study presents a comprehensive
comparison and optimization of discrete tokens generated by various leading SSL
models in speech recognition and synthesis tasks. We aim to explore the
universality of speech discrete tokens across multiple speech tasks.
Experimental results demonstrate that discrete tokens achieve comparable
results against systems trained on FBank features in speech recognition tasks
and outperform mel-spectrogram features in speech synthesis in subjective and
objective metrics. These findings suggest that universal discrete tokens have
enormous potential in various speech-related tasks. Our work is open-source and
publicly available to facilitate research in this direction.
PhenoImage: An open-source graphical user interface for plant image analysis
High-throughput genotyping coupled with molecular breeding approaches has dramatically accelerated crop improvement programs. More recently, improved plant phenotyping methods have led to a shift from manual measurements to automated platforms with increased scalability and resolution. Considerable effort has also gone into developing large-scale downstream processing of the imaging datasets derived from high-throughput phenotyping (HTP) platforms. However, most available tools require some programming skills. We developed PhenoImage, an open-source, cross-platform graphical user interface (GUI) for HTP image processing, intended to make image analysis accessible to users with little or no programming experience. The open-source nature provides the possibility to extend its usability to meet user-specific requirements. The availability of multiple functions and filtering parameters provides flexibility to analyze images from a wide variety of plant species and platforms. PhenoImage can be run on a personal computer as well as on high-performance computing clusters. To test the efficacy of the application, we analyzed red, green, and blue (RGB) color intensity and plant pigmentation-based fluorescence shoot images derived from the LemnaTec imaging system for two plant species that differ in their physical attributes: sorghum [Sorghum bicolor (L.) Moench] and wheat (Triticum aestivum L.). In the study, we discuss the development, implementation, and operation of PhenoImage.
Effects of Coronal Magnetic Field Configuration on Particle Acceleration and Release during the Ground Level Enhancement Events in Solar Cycle 24
Ground level enhancements (GLEs) are extreme solar energetic particle (SEP)
events that are of particular importance in space weather. In solar cycle 24,
two GLEs were recorded on 2012 May 17 (GLE 71) and 2017 September 10 (GLE 72),
respectively, by a range of advanced modern instruments. Here we conduct a
comparative analysis of the two events by focusing on the effects of
large-scale magnetic field configuration near active regions on particle
acceleration and release. Although both active regions were located near the
western limb, temporal variations of SEP intensities and energy spectra
measured in situ display different behaviors at early stages. Using a
potential field model, we find that the CME in GLE 71 originated below the
streamer belt, while the CME in GLE 72 originated near the edge of the
streamer belt. We reconstruct the
CME shock fronts with an ellipsoid model based on nearly simultaneous
coronagraph images from multi-viewpoints, and further derive the 3D shock
geometry at the GLE onset. The highest-energy particles are primarily
accelerated in the shock-streamer interaction regions, i.e., likely at the nose
of the shock in GLE 71 and the eastern flank in GLE 72, due to
quasi-perpendicular shock geometry and confinement of closed fields.
Subsequently, they are released to the field lines connecting to near-Earth
spacecraft when the shocks move through the streamer cusp region. This suggests
that magnetic structures in the corona, especially shock-streamer interactions,
may have played an important role in the acceleration and release of the
highest-energy particles in the two events.
Comment: Accepted for publication in Ap
UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding
The utilization of discrete speech tokens, divided into semantic tokens and
acoustic tokens, has been proven superior to traditional acoustic feature
mel-spectrograms in terms of naturalness and robustness for text-to-speech
(TTS) synthesis. Recent popular models, such as VALL-E and SPEAR-TTS, allow
zero-shot speaker adaptation through auto-regressive (AR) continuation of
acoustic tokens extracted from a short speech prompt. However, these AR models
are restricted to generating speech only in a left-to-right direction, making
them unsuitable for speech editing where both preceding and following contexts
are provided. Furthermore, these models rely on acoustic tokens, which have
audio quality limitations imposed by the performance of audio codec models. In
this study, we propose a unified context-aware TTS framework called UniCATS,
which is capable of both speech continuation and editing. UniCATS comprises two
components, an acoustic model CTX-txt2vec and a vocoder CTX-vec2wav.
CTX-txt2vec employs contextual VQ-diffusion to predict semantic tokens from the
input text, enabling it to incorporate the semantic context and maintain
seamless concatenation with the surrounding context. Following that,
CTX-vec2wav utilizes contextual vocoding to convert these semantic tokens into
waveforms, taking into consideration the acoustic context. Our experimental
results demonstrate that CTX-vec2wav outperforms HifiGAN and AudioLM in terms
of speech resynthesis from semantic tokens. Moreover, we show that UniCATS
achieves state-of-the-art performance in both speech continuation and editing.
PI‑Plat: a high‑resolution image‑based 3D reconstruction method to estimate growth dynamics of rice inflorescence traits
Background: Recent advances in image-based plant phenotyping have improved our capability to study vegetative stage growth dynamics. However, more complex agronomic traits such as inflorescence architecture (IA), which predominantly contributes to grain crop yield, are more challenging to quantify and hence are relatively less explored. Previous efforts to estimate inflorescence-related traits using image-based phenotyping have been limited to destructive end-point measurements. Development of non-destructive inflorescence phenotyping platforms could accelerate the discovery of the phenotypic variation with respect to inflorescence dynamics and mapping of the underlying genes regulating critical yield components.
Results: The major objective of this study is to evaluate post-fertilization development and growth dynamics of inflorescence at high spatial and temporal resolution in rice. For this, we developed the Panicle Imaging Platform (PI-Plat) to characterize multi-dimensional features of IA in a non-destructive manner. We used 11 rice genotypes to capture multi-view images of the primary panicle on a weekly basis after fertilization. These images were used to reconstruct a 3D point cloud of the panicle, which enabled us to extract digital traits such as voxel count and color intensity. We found that the voxel count of developing panicles is positively correlated with seed number and weight at maturity. The voxel count from developing panicles projected overall volumes that increased during the grain filling phase, while quantification of color intensity estimated the rate of panicle maturation. Our 3D-based phenotyping solution showed superior performance compared to conventional 2D-based approaches.
Conclusions: To harness the potential of the existing genetic resources, we need a comprehensive understanding of the genotype-to-phenotype relationship. Relatively low-cost sequencing platforms have facilitated high-throughput genotyping, while phenotyping, especially for complex traits, has posed major challenges for crop improvement. PI-Plat offers a low-cost and high-resolution platform to phenotype inflorescence-related traits using a 3D reconstruction-based approach. Further, the non-destructive nature of the platform facilitates analyses of the same panicle at multiple developmental time points, which can be utilized to explore the genetic variation for dynamic inflorescence traits in cereals.
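The abstract above uses voxel count as a digital trait extracted from a reconstructed 3D point cloud. As a minimal sketch of how such a count can be obtained, the following bins each point into a cubic grid cell and counts the distinct occupied cells. The function name `voxel_count`, the voxel size, and the sample points are assumptions for illustration; PI-Plat's actual reconstruction and trait extraction pipeline is not described in enough detail here to reproduce.

```python
import math


def voxel_count(points, voxel_size):
    """Count distinct cubic voxels of side voxel_size occupied by 3D points."""
    occupied = {
        tuple(math.floor(coord / voxel_size) for coord in point)
        for point in points
    }
    return len(occupied)


# Hypothetical point cloud: the first two points fall into the same voxel.
cloud = [(0.1, 0.1, 0.1), (0.2, 0.3, 0.4), (1.5, 0.0, 0.0), (-0.1, 0.0, 0.0)]
count = voxel_count(cloud, voxel_size=1.0)
```

Using `math.floor` rather than integer truncation keeps points with negative coordinates in their own cells instead of merging them into the voxel at the origin.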